Precursors of extreme increments 
We investigate precursors and predictability of extreme increments in a time series. The events 
we are focusing on consist in large increments within successive time steps. We are especially 
interested in understanding how the quality of the predictions depends on the strategy to choose 
precursors, on the size of the event and on the correlation strength. We study the prediction of 
extreme increments analytically in an AR(1) process, and numerically in wind speed recordings 
and long-range correlated ARMA data. We evaluate the success of predictions via receiver operating 
characteristics (ROC-curves). Furthermore, we observe an increase of the quality of predictions with 
increasing event size and with decreasing correlation in all examples. Both effects can be understood 
by using the likelihood ratio as a summary index for smooth ROC-curves. 
I. INTRODUCTION 

Extreme value statistics is a well-established approach
to predicting the relative frequency of rare extreme
events, but it does not include forecasts of when the next
event will occur. There have been many attempts to employ
time series strategies for the latter purpose. These
strategies usually investigate a record of historical data
about the phenomenon under study and try to infer
knowledge about the future. A standard approach is to
search for precursors, i.e., typical signatures preceding
an extreme event. Such precursors have been discussed,
e.g., in the literature about earthquakes [3], epileptic
seizures, and stock market crashes. As the examples
listed above illustrate, the definition of what an extreme
event is depends on the context. Frequently, one encounters
extremely large values of some observable, or some drastic
changes. It is the latter which is the focus of this paper,
where we discuss large increments motivated by stock
markets or by turbulent gusts in wind speed data.

One might expect that the more extreme an event is,
the more difficult it is to predict, simply because more
extreme events are usually also much rarer. However, it
has been reported in the literature on wind speed predictions,
precipitation forecasts, multi-agent games [11] and
earthquakes that more extreme events are more predictable
than small events. Therefore one particular goal of this
contribution is to investigate how the predictability of
large increments depends on the size of the increment.

In this contribution we study predictions in a simple
autoregressive process of order 1 (AR(1) process)
analytically in order to obtain a detailed understanding
of some questions on precursors and predictions. The
AR(1) process is a simple stationary stochastic model
process that might not reflect all features of more complex
processes occurring in nature, but it admits a fully
analytic treatment. Additionally, we study similar prediction
procedures numerically in long-range correlated
data and in wind speed data, verifying the same quantitative
results. The questions we intend to answer
are the following:

(Q1) How to choose a precursor in order to obtain good
predictions?

(Q2) Are extreme increments the more predictable, the
more extreme they are?

(Q3) How does the correlation of the data influence the
predictability of extreme increments?

The paper is organized as follows. In Sec. II A we discuss
two strategies which can be used to choose precursory
structures and in Sec. II B we introduce a method to
evaluate the predictive power of precursors. The extreme
events we discuss in this contribution are defined in
Sec. II C and we show how to obtain their joint PDFs
analytically in Sec. II D. We apply these procedures to AR(1)
correlated stochastic processes in Sec. III, to wind speed
measurements in Sec. IV, and to long-range correlated
data in Sec. V. Conclusions appear in Sec. VI.



II. DEFINITIONS AND SET-UP 

The considerations in this introductory section are
made for general dynamical systems with a complex time
evolution. They might be purely deterministic, and then
high-dimensional and chaotic, or they might be stochastic.
In any case we assume that the time evolution of
the system cannot be easily modeled and hence one tries
to extract information about the future from time series
data. This means that through some experimental observation
one can record a usually univariate time series,
i.e., a set of measurements $x_n$ at discrete times $t_n$, where
$t_n = t_0 + n\Delta$ with a sampling interval $\Delta$. The recording
should contain sufficiently many extreme events so that
we are able to extract statistical information about them.
We also assume that the event of interest can be identified
on the basis of the observations, e.g., by the value of
the observation function exceeding some threshold, by a
sudden increase, or by its variance exceeding some
threshold.



A. The choice of the precursor

Ideally, a precursor is a typical signature in the data
preceding every individual event. Unfortunately, the
time evolution of most systems is usually too irregular
to demand this, so one would call a precursor a data
structure which typically precedes an event, allowing
deviations from the given structure, but also allowing
events without a preceding structure. This interpretation
of a precursor allows us to determine the specific values of
the precursory structure by statistical considerations.

In order to predict an event occurring at the time
$(n+1)$ we compare the last $k$ observations
$\mathbf{x}_{(n,k)} = (x_{n-k+1}, x_{n-k+2}, \ldots, x_{n-1}, x_n)$
with a specific precursory structure $\mathbf{x}_{\mathrm{pre}}$.

This precursory structure can be chosen according to
different strategies. The two strategies which
we address here represent the most fundamental choices.
They consist in using either the maximum of the a posteriori
PDF or the maximum of the likelihood [14].
In more applied examples one looks for precursors which
minimize or maximize more sophisticated quantities, e.g.,
discriminant functions or loss matrices. These quantities
are usually functions of the posterior PDF or the
likelihood, but they take into account the additional demands
of the specific problem, e.g., minimizing the loss
due to a false prediction. The two strategies studied
in this contribution are thus fundamental in the sense
that they enter into most of the more sophisticated quantities
which are used for predictions and decision making.

The a posteriori PDF $p(\mathbf{x}_{(n,k)}|X)$ takes into account
all events of size $X$ and provides the probability density
to find a specific precursory structure before an observed
event.

(I) Hence strategy I consists in defining the precursors
in a retrospective or a posteriori way: once
the extreme event $X$ has been identified, one
asks for the signals right before it. Formally,
this implies that the precursory structure consists
of the global maxima in each component
$x_{n-k+1}, x_{n-k+2}, \ldots, x_{n-1}, x_n$ of the a posteriori
PDF.



The likelihood $p(X|\mathbf{x}_{(n,k)})$ takes into account all possible
values of precursory structures, and provides the
probability density that an event of size $X$ will follow
them. Note that the likelihood is thus not a density function
with respect to the precursory structure, but with
respect to the event size $X$. The precursory structure enters
into the likelihood only as a parameter.

(II) Strategy II consists in determining those values of
each component $x_i$ of the condition $\mathbf{x}_{(n,k)}$ for which
the likelihood has a global maximum.

Note that the a posteriori PDF and the likelihood are
linked via Bayes's theorem

$$p(\mathbf{x}_{(n,k)}, X) = p(\mathbf{x}_{(n,k)})\,p(X|\mathbf{x}_{(n,k)}) = p(\mathbf{x}_{(n,k)}|X)\,p(X),$$

where $p(\mathbf{x}_{(n,k)})$ represents the marginal PDF to find
the precursory structure $\mathbf{x}_{(n,k)}$ and $p(X)$ represents the
marginal PDF to find events of size $X$.

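In summary the possible values of precursors are given
by

$$\mathbf{x}_{\mathrm{pre}} = \begin{cases} (x^{I}_{n-k+1}, \ldots, x^{I}_{n}) & \text{(strategy I)}, \\ (x^{II}_{n-k+1}, \ldots, x^{II}_{n}) & \text{(strategy II)}, \end{cases} \qquad (1)$$

where $x^{I}_i$ are the points in which $p(\mathbf{x}_{(n,k)}|X)$ has a global
maximum and $x^{II}_i$ are the points in which $p(X|\mathbf{x}_{(n,k)})$ has
its largest maximum, with $n-k+1 \le i \le n$. In both
cases the event size $X$ is assumed to be fixed. Once the
precursory structure $\mathbf{x}_{\mathrm{pre}}$ is determined, we give an alarm
for an extreme event when we find the last $k$ observations
$\mathbf{x}_{(n,k)}$ in the volume

$$V_{\mathrm{pre}}(\delta) = \prod_{i=n-k+1}^{n} \left[ x^{\mathrm{pre}}_i - \frac{\delta}{2},\; x^{\mathrm{pre}}_i + \frac{\delta}{2} \right]. \qquad (2)$$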



This method of determining the precursor is especially
useful if the PDF of a process has one clearly defined
maximum. For multimodal PDFs the strategy of using
only the global maxima can surely be improved by also
considering the influence of smaller maxima of the PDF.
In this case the precursory volume could, e.g., consist of
all $\mathbf{x}_{(n,k)}$ for which the PDFs have values above a certain
threshold. Then $V_{\mathrm{pre}}(\delta)$ might not be simply
connected, but apart from this the procedure of predicting
should not be different. However, we restrict ourselves to
unimodal PDFs in this contribution.



B. Testing for predictive power 

A common method to verify a hypothesis or test the
quality of a prediction is the receiver operating characteristic
curve (ROC-plot). The idea of the ROC-curve
consists simply in comparing the rate of correctly
predicted events $r_c$ with the rate of false alarms $r_f$ by
plotting $r_c$ vs. $r_f$. The resulting curve in the unit square
of the $r_f$-$r_c$ plane approaches the origin for $\delta \to 0$ and
the point (1,1) in the limit $\delta \to \infty$, where $\delta$ accounts for
the size of the precursor volume $V_{\mathrm{pre}}(\delta)$ (see Eq. (2)).

The shape of the curve characterizes the significance
of the prediction. A curve above the diagonal reveals
that the corresponding strategy of prediction is better
than a random prediction, which is characterized by the
diagonal. Furthermore we are interested in curves which
converge as fast as possible to $r_c = 1$, since this scenario
tells us that we reach the highest possible rate of correct
predictions without having a large rate of false alarms.

There are various so-called summary indices [18] which
quantify the behavior of the ROC. In this contribution we
use the so-called likelihood ratio [17] in order to quantify
the ROC-curve. The likelihood ratio is identical to the
slope $m$ of the ROC-curve. For use as a summary
index, we consider the slope in the vicinity of the origin,
which implies $\delta \to 0$.
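To make the construction of a ROC-curve concrete, the following minimal Python sketch computes the two rates for a scalar precursor (k = 1) by simple counting. The function names `roc_point` and `roc_curve` and their arguments are illustrative assumptions and do not appear in the original analysis; an alarm is raised whenever an observation lies within an interval of width delta centred on the precursor value.

```python
def roc_point(observations, is_event, x_pre, delta):
    """Return (r_f, r_c) for alarms raised when |x - x_pre| <= delta / 2.

    observations: candidate precursor values x_n
    is_event:     booleans, True if an event followed the observation x_n
    """
    hits = false_alarms = n_events = n_nonevents = 0
    for x, event in zip(observations, is_event):
        alarm = abs(x - x_pre) <= delta / 2.0
        if event:
            n_events += 1
            hits += alarm          # correctly predicted event
        else:
            n_nonevents += 1
            false_alarms += alarm  # alarm without a following event
    r_c = hits / n_events if n_events else 0.0
    r_f = false_alarms / n_nonevents if n_nonevents else 0.0
    return r_f, r_c


def roc_curve(observations, is_event, x_pre, deltas):
    """Sweep the size delta of the alarm interval to trace out the ROC-curve."""
    return [roc_point(observations, is_event, x_pre, d) for d in deltas]
```

Sweeping `deltas` from very small to very large values moves the point from the origin towards (1,1), as described above.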

The term likelihood ratio results from signal detection
theory, in which context the term "a posteriori PDF"
refers to the PDF which we call likelihood in the context
of predictions, and vice versa. This is due to the fact
that the aim of signal detection is to identify a signal
which was already observed in the past, whereas predictions
are made about future events. Thus the "likelihood
ratio" is in our case in fact a ratio of the posterior PDFs,
as defined by

$$m = \frac{\Delta r_c}{\Delta r_f} \sim \frac{p(\mathbf{x}_{(n,k)}|X)}{p(\mathbf{x}_{(n,k)}|\bar{X})} + \mathcal{O}(\delta), \qquad \delta \to 0, \qquad (3)$$

where $p(\mathbf{x}_{(n,k)}|\bar{X})$ denotes the a posteriori PDF for non-events.
However, we will use the common name likelihood
ratio throughout the text.

The likelihood ratio can be expressed in terms of the
likelihood $p(X|\mathbf{x}_{(n,k)})$ and the total probability to find
events $p(X)$:

$$m(\mathbf{x}_{(n,k)}, X) = \frac{1-p(X)}{p(X)}\;\frac{p(X|\mathbf{x}_{(n,k)})}{1-p(X|\mathbf{x}_{(n,k)})}. \qquad (4)$$
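The step from Eq. (3) to Eq. (4) can be sketched as follows. The sketch writes $\mathbf{x}$ for $\mathbf{x}_{(n,k)}$ and assumes only Bayes's theorem together with $p(\bar{X}|\mathbf{x}) = 1 - p(X|\mathbf{x})$ and $p(\bar{X}) = 1 - p(X)$, i.e., that event and non-event are complementary:

```latex
\begin{align*}
p(\mathbf{x}|X) &= \frac{p(X|\mathbf{x})\,p(\mathbf{x})}{p(X)}, &
p(\mathbf{x}|\bar{X}) &= \frac{\bigl(1-p(X|\mathbf{x})\bigr)\,p(\mathbf{x})}{1-p(X)}, \\
m(\mathbf{x},X) &= \frac{p(\mathbf{x}|X)}{p(\mathbf{x}|\bar{X})}
 = \frac{1-p(X)}{p(X)}\;\frac{p(X|\mathbf{x})}{1-p(X|\mathbf{x})}.
\end{align*}
```

The marginal $p(\mathbf{x})$ cancels in the ratio, which is why the slope depends on the precursor only through the likelihood.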



If we assume that the events we are observing are quite
rare and hence $p(X),\ p(X|\mathbf{x}_{(n,k)}) \ll 1$, the likelihood ratio
is approximately given by

$$m(\mathbf{x}_{(n,k)}, X) \approx \frac{p(X|\mathbf{x}_{(n,k)})}{p(X)} = \frac{p(\mathbf{x}_{(n,k)}|X)}{p(\mathbf{x}_{(n,k)})}. \qquad (5)$$

Eq. (5) already suggests answers to questions (Q1) and
(Q2), by considering $m(\mathbf{x}_{(n,k)}, X)$ as a summary index.

ad (Q1): This asymptotic form of the likelihood ratio
allows us to compare different strategies of prediction.
Looking for the maximum of $p(\mathbf{x}_{(n,k)}|X)$ in $\mathbf{x}_{(n,k)}$, according
to strategy I, there is always the influence of the
denominator $p(\mathbf{x}_{(n,k)})$ which will keep the likelihood ratio
small, even if $p(\mathbf{x}_{(n,k)}|X)$ is maximized in $\mathbf{x}_{(n,k)}$. This is
due to the fact that $p(\mathbf{x}_{(n,k)}|X)$ cannot be large without
$p(\mathbf{x}_{(n,k)})$ being large. Strategy II, which uses the maximum
of $p(X|\mathbf{x}_{(n,k)})$ in $\mathbf{x}_{(n,k)}$, should thus be superior,
since the denominator $p(X)$ is independent of the chosen
precursor. The examples which are studied in Sec. III,
Sec. IV and Sec. V support this idea.

ad (Q2): According to Eq. (5), the likelihood ratio
is larger than unity if $p(\mathbf{x}_{(n,k)}, X) > p(\mathbf{x}_{(n,k)})\,p(X)$, i.e.,
if $\mathbf{x}_{(n,k)}$ and $X$ are correlated. This condition can
also be written as $p(X|\mathbf{x}_{(n,k)}) > p(X)$ or as
$p(\mathbf{x}_{(n,k)}|X) > p(\mathbf{x}_{(n,k)})$ using Bayes's theorem. The latter expression
states that the a posteriori PDF $p(\mathbf{x}_{(n,k)}|X)$, i.e., the
probability to find the precursor prior to an event, should
be larger than the probability to find the precursor prior
to an arbitrary value. Thus, the condition is fulfilled by
choosing the precursor in a reasonable way, e.g., using
the maximum of $p(\mathbf{x}_{(n,k)}|X)$ in $\mathbf{x}_{(n,k)}$ or the maximum
of $p(X|\mathbf{x}_{(n,k)})$.



C. Definition of Extreme Increments

In this contribution we will concentrate on extreme
events which consist in a sudden increase (or decrease) of
the observed variable within a few time steps. Examples
of this kind of extreme event are the increases in wind
speed in [9, 19], but also stock market crashes, which
consist in sudden decreases.

We define our extreme event by an increment $x_{n+1} - x_n$
exceeding a given threshold $d$,

$$x_{n+1} - x_n \geq d, \qquad (6)$$

where $x_n$ and $x_{n+1}$ denote the observed values at two
consecutive time steps.
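As a small illustration of the event definition in Eq. (6), the following Python sketch flags every time step that is followed by an increment of size d or larger. The function names are hypothetical and chosen only for this example.

```python
def extreme_increment_flags(series, d):
    """flags[n] is True if series[n + 1] - series[n] >= d, cf. Eq. (6)."""
    return [series[n + 1] - series[n] >= d for n in range(len(series) - 1)]


def event_rate(series, d):
    """Relative frequency of extreme increments of size d or larger."""
    flags = extreme_increment_flags(series, d)
    return sum(flags) / len(flags) if flags else 0.0
```

The flag at index n refers to the observation x_n that precedes the increment, which is the value later used as a precursor candidate.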



D. Obtaining the analytic expression of the 
posterior PDFs 

A mathematical expression for a filter which selects
the PDF of our extreme events out of the PDFs of the
underlying stochastic process can be obtained through
the Heaviside function $\Theta(x_{n+1} - x_n - d)$. This filter is
then applied to the joint PDF of a stochastic process.
Since only the time steps $(x_n, x_{n+1})$ are of relevance for
the filtering, we can neglect all previous time steps and
apply the filter simply to the joint PDF for $(x_n, x_{n+1})$,
which then has the form $p(x_n, x_{n+1}) = p(x_n)\,p(x_{n+1}|x_n)$.
This implies that we can regard all previous time steps
$x_0, x_1, \ldots, x_{n-1}$, on which $p_n$ and $p_{n+1}$ might depend, as
parameters.

The joint PDF of the extreme events $p^{\Theta}(x_{n+1}, x_n, d)$
can then be obtained by multiplication with
$\Theta(x_{n+1} - x_n - d)$. If the resulting expression is nonzero, the
condition of the extreme event (6) is fulfilled and for $x_{n+1}$
and $x_n$ the following relation holds:

$$x_{n+1} = x_n + d + \gamma \qquad (\gamma \in \mathbb{R},\ \gamma \geq 0). \qquad (7)$$



Hence it is possible to express the joint probability density
in terms of $x_n$ or $x_{n+1}$ with the new random variable
$\gamma$. We can then use the integral representation of
the Heaviside function with appropriate substitutions to
obtain:

$$p^{\Theta}(x_{n+1}, x_n, d) = p(x_n) \int_0^{\infty} p(x_n + d + \gamma \,|\, x_n)\; \delta\bigl((x_{n+1} - x_n - d) - \gamma\bigr)\, d\gamma. \qquad (8)$$

By normalizing this expression with the total probability
$p_{\Theta}(d)$ to find extreme events of size $d$ or larger we obtain
the joint PDF $p^{\Theta}(x_n, x_{n+1}, d)$ of all values of $x_n$ and
$x_{n+1}$ which are part of an extreme event. Integrating the
resulting joint PDF $p^{\Theta}(x_n, x_{n+1}, d)$ over $x_{n+1}$ we find the
following expression for the marginal distribution, i.e.,
the a posteriori PDF:

$$p(x_n|X(d)) = \frac{p(x_n)}{p_{\Theta}(d)} \int_0^{\infty} d\gamma\; p(x_n + d + \gamma \,|\, x_n). \qquad (9)$$



Analogously, $p(x_n|\bar{X}(d))$ denotes the a posteriori PDF
to observe the value $x_n$ before a non-event, i.e., before
an increment which is smaller than $d$:

$$p(x_n|\bar{X}(d)) = \frac{p(x_n)}{1 - p_{\Theta}(d)} \int_{-\infty}^{\infty} dx_{n+1}\, \bigl[1 - \Theta(x_{n+1} - x_n - d)\bigr]\, p_{n+1}(x_{n+1}|x_n). \qquad (10)$$

If for a given process the joint PDF of two consecutive
events is known, we can hence analytically determine
$p(x_n|X(d))$, $p(x_n|\bar{X}(d))$ and $p_{\Theta}(d)$.



III. EXTREME INCREMENTS IN THE AR(1) 
MODEL 



A. AR(1) model 

We assume that the time series $\{x_n\}$ is generated by
an autoregressive model of order 1 (AR(1)),

$$x_{n+1} = a\,x_n + \xi_n, \qquad (11)$$

where $\xi_n$ are uncorrelated Gaussian random numbers
with unit variance and $-1 < a < 1$ is a constant which
represents the coupling strength. The size and the sign
of the coupling strength determine whether successive values of
$x_n$ are clustered or spread, as illustrated in Fig. 1.

FIG. 1: (Color online) Parts of the time series of the AR(1)
process for different values of a.

In the case $a = 0$ the process reduces to uncorrelated
random numbers with mean $\mu = 0$ and variance $\sigma^2 = 1$,
whereas generally the process is exponentially correlated,
$\langle x_n x_{n+k} \rangle = a^{|k|}/(1-a^2)$ with $|a| < 1$, and has the marginal PDF

$$p(x_n) = \sqrt{\frac{1-a^2}{2\pi}}\; \exp\left( -\frac{1-a^2}{2}\, x_n^2 \right). \qquad (12)$$

Since the size of the events is naturally measured in units
of the standard deviation $\sigma(a)$, we introduce a new scaled
variable $\eta = d/\sigma(a) = d\sqrt{1-a^2}$.
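The model and the scaling of the event size can be tried out with a few lines of Python. This is only a sketch under stated assumptions: the function names, the seed handling and the series length are choices made for this illustration, not values taken from the study. The process is started from its stationary distribution so that no transient has to be discarded.

```python
import math
import random


def simulate_ar1(a, n, seed=0):
    """Generate n values of x_{n+1} = a * x_n + xi_n with unit-variance Gaussian xi_n."""
    rng = random.Random(seed)
    # Draw the first value from the stationary marginal, sigma(a) = 1 / sqrt(1 - a^2).
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - a * a))
    series = []
    for _ in range(n):
        series.append(x)
        x = a * x + rng.gauss(0.0, 1.0)
    return series


def threshold_from_eta(eta, a):
    """Convert the scaled event size eta into the threshold d = eta * sigma(a)."""
    return eta / math.sqrt(1.0 - a * a)
```

For example, `threshold_from_eta(2.0, 0.0)` gives a threshold of two standard deviations of the uncorrelated process, and the sample variance of `simulate_ar1(0.5, 100000)` should lie close to 1/(1 - 0.25).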

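Applying the filter mechanism developed in Sec. II D
we obtain the following expressions for the posterior PDF
of extreme events and the posterior PDF of non-extreme
events:

$$p(x_n|X(\eta), a) = \frac{\sqrt{1-a^2}}{2\sqrt{2\pi}\; p_{\Theta}(\eta, a)}\; \exp\left( -\frac{1-a^2}{2}\, x_n^2 \right) \mathrm{erfc}\left( \frac{(1-a)\,x_n}{\sqrt{2}} + \frac{\eta}{\sqrt{2}\sqrt{1-a^2}} \right), \qquad (13)$$

$$p(x_n|\bar{X}(\eta), a) = \frac{\sqrt{1-a^2}}{\sqrt{2\pi}\,\bigl(1 - p_{\Theta}(\eta, a)\bigr)}\; \exp\left( -\frac{1-a^2}{2}\, x_n^2 \right) \left[ 1 - \frac{1}{2}\,\mathrm{erfc}\left( \frac{(1-a)\,x_n}{\sqrt{2}} + \frac{\eta}{\sqrt{2}\sqrt{1-a^2}} \right) \right]. \qquad (14)$$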
FIG. 2: (Color online) The a posteriori PDFs for the AR(1)
process for different values of $a < 0$ and $\eta$. The vertical
lines represent the means. The PDFs become asymmetric for
$a \to -1$. (For $a = -0.99$ and $\eta \to \infty$ the marginal PDFs
become very flat and can hence not be distinguished from
the x-axis in this figure.)



B. Determining the precursor value 



Because of the Markov property of the AR(1) model
the probability for an event at time $n+1$ depends only
on the last value $x_n$, hence $k = 1$ in Eq. (2). Thus, we give
an alarm for an extreme event when an observed value $x_n$
is in an interval $V_{\mathrm{pre}} = [x_{\mathrm{pre}} - \delta/2,\ x_{\mathrm{pre}} + \delta/2]$ around the
precursor value $x_{\mathrm{pre}}$. We compute the precursor values
$x_I$ and $x_{II}$ defined by Eq. (1) according to the strategies
described in Sec. II A.

The maximum $x_I$ of $p(x_n|X(\eta), a)$ is given by the solution
of the transcendental equation

$$(1+a)\, x_I\, \mathrm{erfc}\bigl(z(x_I)\bigr) + \sqrt{\frac{2}{\pi}}\, \exp\bigl(-z(x_I)^2\bigr) = 0, \qquad z(x_n) = \frac{(1-a)\,x_n}{\sqrt{2}} + \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}. \qquad (15)$$

Inserting the asymptotic expansion for large arguments
of the complementary error function

$$\mathrm{erfc}(z) \sim \frac{\exp(-z^2)}{\sqrt{\pi}\, z} \left( 1 + \sum_{m=1}^{\infty} (-1)^m\, \frac{1 \cdot 3 \cdots (2m-1)}{(2z^2)^m} \right), \qquad z \to \infty,\ |\arg z| < \frac{3\pi}{4}, \qquad (16)$$

which can be found in [20], we obtain to leading order:

$$x_I \sim -\frac{\eta}{2\sqrt{1-a^2}}, \qquad \eta \to \infty. \qquad (17)$$

Fig. 2 shows the posterior PDFs $p(x_n|X(\eta), a)$ according
to Eq. (13) for different values of $a$ and $\eta$. One can see
that the maximum of $p(x_n|X(\eta), a)$ moves towards $-\infty$
with increasing size of $\eta$ and $a \to 1$. Although we can always
formally define the maximum $x_I$ and the mean $\langle x_n \rangle$
as precursor values, one can argue that the maximum of
the distribution has no predictive power if $a \to 1$. Since
the variance of the posterior PDF increases immensely
in this limit, the value of $p(x_n|X(\eta), a)$ at its maximum
does not differ considerably from the values at any other
point.

For large values of $\eta$ we can also assume that the maximum
and the mean of $p(x_n|X(\eta), a)$ nearly coincide, i.e.,

$$x_I \approx \langle x_n \rangle \approx -\frac{\eta}{2\sqrt{1-a^2}} \qquad (\eta \to \infty), \qquad (18)$$

provided that $p(x_n|X(\eta), a)$ is not too asymmetric (i.e.,
$a$ is not close to $-1$). In the numerical tests in Sec.
III C we will hence use the mean of the posterior PDF
as a precursor for strategy I, since it can be calculated
explicitly by evaluating the corresponding integral.

In order to determine $x_{II}$, the precursor for strategy
II, we have to find the maximum in $x_n$ of the likelihood

$$p(X(\eta)|x_n, a) = \frac{1}{2}\,\mathrm{erfc}\left( \frac{(1-a)\,x_n}{\sqrt{2}} + \frac{\eta}{\sqrt{2}\sqrt{1-a^2}} \right). \qquad (19)$$

Since the complementary error function is a
monotonically decreasing function of $x_n$, we do not
have a well-defined maximum $x_{II}$ (we will thus
denote $x_{II} := -\infty$), and the interval $V_- = (-\infty, x_-]$
with the upper limit $x_-$ represents the interval for
raising alarms according to strategy II.
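The likelihood of Eq. (19) and the large-event approximation of the strategy I precursor in Eq. (18) are easy to evaluate numerically. The following sketch uses only `math.erfc` from the Python standard library; the function names are assumptions made for this example.

```python
import math


def likelihood(x_n, eta, a):
    """p(X(eta) | x_n, a) of Eq. (19): probability that an extreme increment follows x_n."""
    z = (1.0 - a) * x_n / math.sqrt(2.0) + eta / (math.sqrt(2.0) * math.sqrt(1.0 - a * a))
    return 0.5 * math.erfc(z)


def precursor_strategy_one(eta, a):
    """Large-eta approximation of the strategy I precursor, cf. Eq. (18)."""
    return -eta / (2.0 * math.sqrt(1.0 - a * a))
```

Evaluating `likelihood` on an increasing grid of `x_n` gives strictly decreasing values, which is the numerical counterpart of the statement that strategy II has no finite maximum and raises alarms for the smallest observations.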



C. Testing the Performance of the Precursors 

In order to test for the predictive power of the precursors
specified above, we used two different methods
to create ROC-curves (see Sec. II B). The first method
consists in evaluating the integrals which lead to the rates
of correct and false predictions,

$$r_c(x_{\mathrm{pre}}, \eta, \delta) = \int_{V(\delta)} dx_n\; p(x_n|X(\eta), a), \qquad (20)$$

$$r_f(x_{\mathrm{pre}}, \eta, \delta) = \int_{V(\delta)} dx_n\; p(x_n|\bar{X}(\eta), a). \qquad (21)$$

The second method consists in simply performing predic- 
tions on a time series of 10^ AR(1) data and counting the 
number of extreme increments, which could be predicted 
by using the precursors specified above. For different val- 
ues of the correlation coefficients the data sets contained 
the following numbers of extreme increments: 
In all cases where the AR(1) correlated data sets contain
increments, the empirically determined rates comply very
well with the rates obtained via the evaluation of Eqs.
(20) and (21). For those values of $a$ and $\eta$ which were
not accessible for the numerical test, we evaluated the
integrals in Eqs. (20) and (21).
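The counting procedure of the second method can be sketched in a few lines of Python for strategy II, where an alarm is raised whenever the last observation falls below an upper limit. This is a minimal sketch under assumptions: the helper names, the default series length and the seed are illustrative choices and are not the settings used for the results reported here.

```python
import math
import random


def ar1_pairs(a, n, seed=0):
    """Yield n consecutive pairs (x_n, x_{n+1}) of a stationary AR(1) process."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - a * a))
    for _ in range(n):
        x_next = a * x + rng.gauss(0.0, 1.0)
        yield x, x_next
        x = x_next


def empirical_rates(a, eta, x_upper, n=200_000, seed=0):
    """Strategy II by counting: alarm if x_n <= x_upper.

    Returns (r_f, r_c, n_events) for events x_{n+1} - x_n >= eta * sigma(a).
    """
    d = eta / math.sqrt(1.0 - a * a)
    hits = false_alarms = n_events = n_nonevents = 0
    for x, x_next in ar1_pairs(a, n, seed):
        alarm = x <= x_upper
        if x_next - x >= d:
            n_events += 1
            hits += alarm
        else:
            n_nonevents += 1
            false_alarms += alarm
    r_c = hits / n_events if n_events else 0.0
    r_f = false_alarms / n_nonevents if n_nonevents else 0.0
    return r_f, r_c, n_events
```

Repeating the call for a range of `x_upper` values traces out an empirical ROC-curve. For large event sizes the returned `n_events` quickly drops to zero at feasible series lengths, which is the situation in which the integrals have to be evaluated instead.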

In the numerical tests for both strategies and also for 
the evaluation of the integrals in Eqs. (|20|l and (|21|l ac- 
cording to strategy I, the size of the precursory volume 
ranged from 10~^ to 4, measured in size of the stan- 
dard deviation of the marginal PDF of the AR(1) process 
$\sigma(a) = 1/\sqrt{1-a^2}$. As precursors according to strategy I we
used the means of the a posteriori PDF. For the empirically
created ROC-plots according to strategy II we used
the smallest values of the data sets as precursors.

The evaluation of the integrals in Eqs. (20) and (21)
was done in a slightly different way for strategy II. Since
there were no events in the data sets for certain values of $a$
and $d$ (as indicated in the table above), one could argue
that the data sets also did not contain any precursor.
From the previous section, we know that the theoretical
precursor value according to strategy II should be
$x_{II} = -\infty$. Thus, we used a sufficiently small value as
a precursor and adjusted the size of the prediction interval
in order to capture all events. However, the resulting
ROC-curves for strategy II coincided with the curves
obtained empirically, as far as they were available.

The resulting ROC-curves in Fig. 3 display the following
properties:

ad (Q1): The predictions according to strategy II
are better than the predictions according to strategy I
for all values of $a$ and $\eta$.

ad (Q2): The ROC-curves display an increase of
the quality of our prediction with increasing size of the
events $\eta$.

ad (Q3): The ROC-curves in Fig. 3 show that the
quality of the predictions increases with decreasing correlation
strength $a$. Especially for $a = 0$, when the predictions
were made within completely uncorrelated random
numbers, the ROC-curves are far better than the
ROC-curve of a random prediction. This is in agreement
with results reported in [22] for the prediction of signs
of increments in uncorrelated random numbers, i.e., the
case $(a = 0, \eta = 0)$.

Intuitively, the result for (Q3) can be understood easily
by considering that increments are not independent
of the last observation. More precisely, $x_{n+1} - x_n =
(a-1)\,x_n + \xi_n$, so that the known part of the increment,
$(a-1)\,x_n$, is the larger in magnitude, the smaller $a$. In other words:
FIG. 3: (Color online) The ROC-curves made for the precursors
of strategy I and II. The lines represent the results
of strategy I, the symbols correspond to predictions made
according to strategy II. In both cases the predictions were made
within 10^ AR(1)- correlated data. For the values of r; and a, 
where the data sets contained no increments, we created the
ROC-curves by evaluating the integrals in Eqs. (20) and (21).



if we consider a very small value of $x_n$ (small compared
to the mean) in an uncorrelated process, the probability
that the next value will be closer to the mean and hence
lead to a large increment is high. Positive correlation
hinders this effect, since it causes successive values to be
closer to each other.
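The intuitive argument can be made quantitative with a short calculation, sketched here under the assumptions of the AR(1) model with unit-variance noise and stationary variance $1/(1-a^2)$. The symbol $\Delta_n = x_{n+1} - x_n$ is introduced only for this sketch:

```latex
\begin{align*}
\Delta_n &= (a-1)\,x_n + \xi_n, \\
\mathrm{Var}(\Delta_n) &= \frac{(1-a)^2}{1-a^2} + 1 = \frac{2}{1+a}, \qquad
\mathrm{Cov}(x_n,\Delta_n) = \frac{a-1}{1-a^2} = -\frac{1}{1+a}, \\
\mathrm{Corr}(x_n,\Delta_n) &= \frac{-1/(1+a)}{\sqrt{\dfrac{1}{1-a^2}\cdot\dfrac{2}{1+a}}}
 = -\sqrt{\frac{1-a}{2}} .
\end{align*}
```

The correlation between the last observation and the coming increment thus vanishes for $a \to 1$, equals $-1/\sqrt{2}$ for $a = 0$, and approaches $-1$ for $a \to -1$, so the last observation carries more information about the increment the smaller $a$ is.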

A formal explanation of the results (Q1)-(Q3)
is also given by an asymptotic expression for the
slope $m(a, \eta, x_{\mathrm{pre}})$ in the following section.



D. Analytical discussion of the Precursor 
Performance 

In this section, we try to understand the effects
shown by the ROC-curves in the previous section in more
detail. To this end, we evaluate the asymptotic structure of
the likelihood ratio as defined in Sec. II B for different
scenarios.

In the case of the AR(1) process the slope of the ROC-curve
in the vicinity of the origin is given by

$$m(a, \eta, x_{\mathrm{pre}}) = \frac{1 - p_{\Theta}(\eta)}{p_{\Theta}(\eta)}\; r(x_{\mathrm{pre}}, \eta), \qquad (22)$$

where, according to Eq. (4), $r(x_{\mathrm{pre}}, \eta) = p(X(\eta)|x_{\mathrm{pre}}, a) / \bigl(1 - p(X(\eta)|x_{\mathrm{pre}}, a)\bigr)$.

ad (Q1): We will first consider the behavior of the
precursor according to strategy II. As we saw in Sec.
III B, the optimal precursor value of strategy II is the
limiting case $x_{II} = -\infty$.

Since $\lim_{x_{\mathrm{pre}} \to -\infty} r(x_{\mathrm{pre}}, \eta) = \infty$ we find
$\lim_{x_{II} \to -\infty} m(a, \eta, x_{II}) = \infty$. Thus, we should expect
ROC-curves made with $x_{II} = -\infty$ to be tangent
to the vertical axis and hence represent an
ideal predictability for all sizes of events and all possible
correlation strengths. However, for any finite precursor
value of strategy I and strategy II we find non-ideal
ROC-curves.

FIG. 4: (Color online) $p(x_n|X(\eta), a)$ and $p(x_n|\bar{X}(\eta), a)$ for
$a = -0.75$. The maximum of the posterior PDF to observe
extreme events, $p(x_n|X(\eta), a)$, which is used as precursor,
moves towards $-\infty$ with increasing $\eta$ since
$x_I \approx -\eta/(2\sqrt{1-a^2})$. Because the maximum of the failure
posterior PDF $p(x_n|\bar{X}(\eta), a)$ remains at the origin, the values
of $p(x_n|\bar{X}(\eta), a)$ which are observed at the precursor value
$x_I$ decrease according to the decrease of $p(x_n|\bar{X}(\eta), a)$.

Another way to understand the superiority of strategy
II is to analyze the asymptotic behavior of the PDF entering
the rate of correct predictions, $p(x_n|X(\eta), a)$, and of the PDF
entering the rate of false alarms, $p(x_n|\bar{X}(\eta), a)$, at the
precursor value of strategy I. For the following calculations
we use an approximation for the total probability
$p_{\Theta}(\eta, a)$ to observe extreme events,
which is derived in Appendix A.

Inserting the asymptotic expression for $p_{\Theta}(\eta, a)$, the
approximation of $x_I$ and the asymptotic expansion
of the complementary error function, Eq. (16),
into Eqs. (13) and (14), we find asymptotic expressions
for both PDFs at the precursor value in the limit $\eta \to \infty$.
Hence the value of $p(x_n|X(\eta), a)$ at the precursor value
approaches a constant for large $\eta$, whereas the values of
$p(x_n|\bar{X}(\eta), a)$ decrease exponentially in this limit. Fig. 4
illustrates this effect for the case $a = -0.75$. The maximum
of the failure PDF remains at the origin for $\eta \to \infty$.
Thus the values of this PDF which are observed at the
decreasing precursor value $x_I \propto -\eta/(2\sqrt{1-a^2})$ decrease
according to the shape of the distribution. This also explains
the success of strategy II. Since the precursor value obtained
by strategy II is the smallest possible value, strategy
II seems to focus on the minimization of the failure
rate. Note that by "minimization of the failure rate"
we understand here a minimization of the integrand in
Eq. (21), while the alarm interval of size $\delta$ remains constant.
The fact that at this point the corresponding value
of $p(x_n|X(\eta), a)$ is also far away from the maximum of
$p(x_n|X(\eta), a)$ does apparently not influence the outcome
of the prediction.

ad (Q2): In the following calculation we obtain
the asymptotic form of the likelihood ratio for large
events. Inserting the asymptotic form of the probability
$p_{\Theta}(\eta, a)$ provided by Eq. (A4), and using the asymptotic
expansion of the complementary error function in Eq.
(16), the likelihood ratio takes the asymptotic form of Eq. (27).

Note that the limit $z(\eta, a) \to \infty$ corresponds to the limit
$\eta \to \infty$ in the context of (Q2), but we can also interpret
it as the limit $a \to \pm 1$ in the context of (Q3) if $\eta \neq 0$.
The expression in Eq. (27) tends to infinity in the limit
$\eta \to \infty$ if the argument of the exponential function in
Eq. (27), $f(x_{\mathrm{pre}}, a, \eta)$ as given by Eq. (28),
is positive. This is indeed the case for every precursor
value $x_{\mathrm{pre}} < 0$. Therefore, for both strategies of prediction,
the slope $m(x_{\mathrm{pre}}, a, \eta)$ increases as a squared exponential
with increasing size of the events $\eta$ according to
Eq. (27). Hence, the considerations of Sec. II B hold for
our example, according to which an event is the more
predictable the rarer it is.

ad (Q3): One can also calculate the asymptotic behavior
of the likelihood ratio for $a \to \pm 1$. The limit
$z(\eta, a) \to \infty$, which is relevant for the asymptotic form
in Eq. (27), can also be interpreted as the limit $a \to \pm 1$.
We assume that $\eta$ is large enough, e.g., $\eta > 2$, such that
Eq. (A4), which enters into Eq. (27), is a useful approximation.
One can now discuss again the argument of the
exponential function in Eq. (28).

Inserting the precursor of strategy I, one obtains
$f(x_I, a, \eta) = \eta^2/8$, which leads to the asymptotic slope
$m(x_I, a, \eta)$ of Eq. (29). As $a \to 1$, this expression converges
to $\exp(\eta^2/8)$. As $a \to -1$, this expression approaches
infinity as $m(x_I, a, \eta) \propto 1/\sqrt{1+a}$. Fig. 5(a) illustrates this
behavior. Fig. 5(b) shows that the asymptotic expression
in Eq. (29) becomes better in the limit $\eta \to \infty$, since in
this limit the higher order terms of the approximation
vanish even faster.

FIG. 5: (Color online) The bold lines show the dependence
of the slope $m(x_I, a, \eta)$ on the coupling strength according
to Eq. (27). The thin lines display the asymptotic behavior,
given by Eq. (29). The constant lines represent the values
obtained from Eq. (29) in the limit $a \to 1$. Panel (b) illustrates
that this asymptotic expression becomes better in the limit
$\eta \to \infty$, since in this limit the higher order terms in the
approximation vanish even faster.
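The two limits quoted for strategy I can be checked numerically. The sketch below is not the asymptotic expression of the paper. It assumes the closed form $p_{\Theta}(\eta, a) = \frac{1}{2}\,\mathrm{erfc}\bigl(\eta/(2\sqrt{1-a})\bigr)$, which follows from the increment of the AR(1) process being Gaussian with variance $2/(1+a)$, and compares the slope from Eq. (4) with the leading-order estimate $\sqrt{2/(1+a)}\,\exp(\eta^2/8)$, which reproduces both limits stated above. All function names are hypothetical.

```python
import math


def event_probability(eta, a):
    """Total event probability, assuming a Gaussian increment with variance 2 / (1 + a)."""
    return 0.5 * math.erfc(eta / (2.0 * math.sqrt(1.0 - a)))


def slope(x_pre, eta, a):
    """Likelihood ratio of Eq. (4), using the likelihood of Eq. (19)."""
    z = (1.0 - a) * x_pre / math.sqrt(2.0) + eta / math.sqrt(2.0 * (1.0 - a * a))
    p_cond = 0.5 * math.erfc(z)
    p_tot = event_probability(eta, a)
    return (1.0 - p_tot) / p_tot * p_cond / (1.0 - p_cond)


def slope_strategy_one_leading_order(eta, a):
    """Leading-order large-eta estimate derived for this sketch (an assumption)."""
    return math.sqrt(2.0 / (1.0 + a)) * math.exp(eta * eta / 8.0)
```

For a large event size such as `eta = 6`, evaluating `slope` at the strategy I precursor `-eta / (2 * sqrt(1 - a**2))` stays within a few percent of the leading-order estimate for moderate `a`, and the estimate itself tends to `exp(eta**2 / 8)` for `a` close to 1 and grows like `1 / sqrt(1 + a)` for `a` close to -1.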

For the theoretical precursor of strategy II, $x_{II} = -\infty$,
the slope would be independent of the value of the coupling
strength if the exact precursor of strategy II could
be used. For any real precursor value of strategy II,
$x_{II} = \mathrm{const.} < 0$, Eq. (28) can be evaluated explicitly.
The resulting expression approaches a small negative value close
to zero at the point $a = 1$. Hence, we find $m(a, \eta, x_{II}) \sim 1$
as $a \to 1$.

In the limit $a \to -1$ and for any finite precursor value
$x_{II} = \mathrm{const.} < 0$, Eq. (28) can be evaluated in the same way.
If the precursor is sufficiently small, e.g., $x_{II} <
-\eta/(4\sqrt{1-a^2})$, this expression is positive and hence
$m(a, \eta, x_{II}) \to \infty$ as $a \to -1$. Hence, the asymptotic
expressions of the likelihood ratio are able to describe
the behavior of the ROC-curves shown in the previous
section. Fig. 6 combines the dependence of the likelihood
ratio on the event size and the correlation strength.
One can see that the influence of the event size on the
likelihood ratio dominates, as long as one does not
approach the singularity at $a \to -1$.

FIG. 6: (Color online) The asymptotic dependence of the
slope $m(x_I, a, \eta)$ on the coupling strength and the event size,
if the precursor of strategy I is used.



IV. APPLICATION: WIND SPEED 
MEASUREMENTS 

As an illustration of the preceding considerations, and
also in order to demonstrate the usefulness of the benchmarks
derived for AR(1) processes, we study here time
series data of wind speed measurements. The data were
recorded at 30 m above ground by a cup anemometer with
a sampling rate of 8 Hz at the Lammefjord site of the
Risø research center [23]. Wind speed data are evidently
non-stationary and strongly correlated, so that, e.g., the
principle of persistence yields surprisingly accurate forecasts:
the very simple prediction scheme $\hat{x}_{n+1} = x_n$ is
almost as accurate as an AR(20) model fitted on moving
windows (in order to take non-stationarity into account)
or order-10 Markov chains [13]. The amplitude of the
fluctuations around a time-local mean value is proportional
to this mean value, i.e., there is statistical evidence
that the noise in this process is multiplicative. However,
when subtracting the time-local mean (more precisely,
performing a high-pass filtering with a Gaussian kernel
with a standard deviation of 75 time steps), we obtain
data for which it is reasonable to fit an AR(1) process.
When doing so, we find a coefficient $a \approx 0.94$.
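The preprocessing described above can be sketched as follows. This is only a minimal illustration on synthetic data, not the procedure actually applied to the Lammefjord record: the function names `gaussian_highpass` and `fit_ar1`, the truncation of the kernel at three standard deviations, the handling of the boundaries, and the synthetic drift are all assumptions made for the example.

```python
import math
import random


def gaussian_highpass(x, sigma):
    """Subtract a Gaussian-weighted local mean (kernel truncated at 3*sigma)."""
    half = int(3 * sigma)
    kernel = [math.exp(-0.5 * (j / sigma) ** 2) for j in range(-half, half + 1)]
    filtered = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        # Near the boundaries only the part of the kernel inside the record is used.
        weights = kernel[lo - n + half:hi - n + half]
        local_mean = sum(w * v for w, v in zip(weights, x[lo:hi])) / sum(weights)
        filtered.append(x[n] - local_mean)
    return filtered


def fit_ar1(x):
    """Least-squares estimate of a in x[n+1] = a * x[n] + noise (zero-mean x)."""
    num = sum(x[n] * x[n + 1] for n in range(len(x) - 1))
    den = sum(v * v for v in x[:-1])
    return num / den


# Synthetic stand-in for the wind record: AR(1) fluctuations on a slow drift.
rng = random.Random(1)
fluct, series = 0.0, []
for n in range(2000):
    fluct = 0.94 * fluct + rng.gauss(0.0, 1.0)
    series.append(10.0 + 3.0 * math.sin(2 * math.pi * n / 2000) + fluct)

residual = gaussian_highpass(series, sigma=75)
print("fitted AR(1) coefficient:", round(fit_ar1(residual), 3))
```

Note that the high-pass filter itself removes part of the slow fluctuations, so the coefficient fitted to the filtered data need not coincide with the one used to generate the synthetic series.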

Turbulent gusts, i.e., sudden increases of the wind
speed, are relevant events, e.g., for the safe operation
of wind turbines, for aircraft during take-off and landing,
and for all wind-driven sports activities. In previous
work we were therefore concerned with their prediction,
where we studied the performance of a
Markov chain model. Here, we will restrict ourselves
to the simpler (and less appropriate) AR(1) philosophy:
the current state of the process generating the wind time
series is assumed to be fully specified by the last observation
$x_n$, and the event is assumed to be characterized
by an upward jump of the wind speed in a single time
step by more than $g$ m/s.



A. Determining the precursor value 

If we extract from the data set all subsequences of
data where such a jump is present, then we can, in
principle, construct empirically the distribution $p(x_n|g)$,
which corresponds to $p(x_{(n,k)}|X)$ of strategy I. In Fig. 7
we show instead the mean value of $p(x_{n+k}|g)$ for
$k = -20, \dots, 20$, i.e., we show the mean profile of gusts
of strength $g$. Put differently, this is an average of all
those time series segments which (in shifted time) fulfill
$x_1 - x_0 > g$, so that the part of these segments with
$k < 0$ is what one would naively call a precursor of a
gust event. This has to be compared to the values $x_{n+k}$
which we find when we focus on the maximum $x_{II}$ in $x_n$
of $p(g|x_n)$, which corresponds to the conditional probability
$p(X|x_n)$ of strategy II. More specifically, in Fig. 8
we show the profiles $\langle x_{n+k} \rangle |_{x_n = x_{II}}$, where $x_{II}$ is defined
by $p(g|x_{II}) = \max_{x_n} p(g|x_n)$. In yet other words, the value
plotted at $k = 0$ is the value $x_n$ for which $p(g|x_n)$ is maximal,
and at the preceding and succeeding time steps we
show the average over all time series segments which fulfill
$x_n = x_{II}$ within some precision. These profiles differ from
the precursors shown before, as we have to expect for an
AR(1) model: In a perfect AR(1) process, the precursors
equivalent to those in Fig. 7 would show a jump larger
than $g$ from $k = 0$ to $k = 1$, with $x_0 = -x_1$, and with
$x_k = a^{|k|} x_0$ for $k < 0$, and $x_k = a^{k-1} x_1$ for $k > 1$. For the
same idealized process, one expects Fig. 8 to show curves
given by $x_k = a^{|k|} x_{II}$ for all $k$. Evidently, the wind data
show a qualitatively very similar behavior, although
additional correlations are visible.

FIG. 7: The profiles obtained from the mean of $p(x_{n+k}|g)$
for gust events of amplitude $g$. Also shown is the theoretical
profile for an AR(1) process with $a = 0.94$.

FIG. 8: The profiles obtained from the maxima of $p(g|x_{n+k})$
for gust events of amplitude $g$. Also shown is the theoretical
profile for an AR(1) process with $a = 0.94$.
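The event-conditioned average that defines the mean gust profile can be illustrated with a short sketch. It uses a simulated AR(1) series instead of the wind data; the function names, the coefficient $a = 0.5$, the threshold $g = 2$ and the window of five steps are assumptions chosen only to keep the example small.

```python
import random


def simulate_ar1(a, n, seed=0):
    """x[k+1] = a * x[k] + xi[k] with unit-variance Gaussian noise xi."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x


def mean_event_profile(x, g, half_width):
    """Average of x[n+k], k = -half_width..half_width, over all n with
    x[n+1] - x[n] > g; index half_width of the result corresponds to k = 0."""
    sums = [0.0] * (2 * half_width + 1)
    count = 0
    for n in range(half_width, len(x) - max(half_width, 1)):
        if x[n + 1] - x[n] > g:
            count += 1
            for k in range(-half_width, half_width + 1):
                sums[k + half_width] += x[n + k]
    if count == 0:
        raise ValueError("no event of this size in the record")
    return [s / count for s in sums], count


x = simulate_ar1(a=0.5, n=50000)
profile, n_events = mean_event_profile(x, g=2.0, half_width=5)
for k, value in zip(range(-5, 6), profile):
    print(f"k={k:+d}  mean x = {value:+.3f}")
```

In such a run the averaged profile jumps by more than $g$ between $k = 0$ and $k = 1$ and is roughly antisymmetric around the jump, in line with the relation $x_0 = -x_1$ stated for the idealized AR(1) process.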



B. Testing for predictive power

The ROC-curves for the two prediction strategies are
shown in Figs. 9 and 10. As expected, the minimization
of false alarms (strategy II) is superior here, as strategy
I has no predictive power. The latter is consistent with
the observed value $a \approx 0.94$ and the results for the AR(1)
process.

In order to compute the ROC-curves we use the following
numerically expensive but theoretically best justified
algorithm: In theory, we want to generate an alarm if
the current observation $x_n$ lies in an interval $V$ which is
defined by the subset of $\mathbb{R}$ where either $p(g|x_n)$ or
$p(x_n|g)$ exceeds some threshold $0 < p_c < 1$. We assume
that both conditional PDFs are smooth in $x_n$.

FIG. 9: (Color online) The ROC-curves using strategy I,
exploiting $p(x_n|X)$ and maximizing the hit rate. Evidently,
the rate of false alarms exceeds the hit rate.

FIG. 10: (Color online) The ROC-curves for the prediction of
jumps of amplitude larger than $g$ for the wind data. Strategy
II exploits $p(X|x_n)$, which minimizes the false alarm rate and
performs the better the larger $g$.

We can locally approximate $p(g|x_n)$ by searching all
similar states $x_j$ with $|x_n - x_j| < \epsilon$ and counting the
relative number of events in this set of states. When
this number exceeds $p_c$, we give the alarm and can see
whether it is a hit or a false alarm.
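A minimal sketch of this neighbourhood estimate is given below, again on a simulated AR(1) series rather than on the wind data. The function names, the values of $a$, $g$, $\epsilon$ and $p_c$, and the brute-force search over all past states are assumptions made for the illustration.

```python
import random


def simulate_ar1(a, n, seed=0):
    """x[k+1] = a * x[k] + xi[k] with unit-variance Gaussian noise xi."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, 1.0))
    return x


def conditional_event_rate(states, followed_by_event, x_now, eps):
    """Estimate p(g | x_n = x_now) as the fraction of events among all
    recorded states that lie within eps of x_now."""
    hits = total = 0
    for s, e in zip(states, followed_by_event):
        if abs(s - x_now) < eps:
            total += 1
            hits += bool(e)
    return hits / total if total else 0.0


x = simulate_ar1(a=0.5, n=20000)
g = 1.5
states = x[:-1]
events = [x[n + 1] - x[n] > g for n in range(len(x) - 1)]

p_c = 0.2  # alarm threshold (illustrative value)
for x_now in (-2.0, 0.0, 2.0):
    p = conditional_event_rate(states, events, x_now, eps=0.1)
    print(x_now, round(p, 3), "alarm" if p > p_c else "no alarm")
```

For a real forecast the set of similar states would have to be restricted to the past of the current observation; the sketch uses the whole record for simplicity.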

In order to evaluate $p(x_n|g)$ we first create the set of all
states $x_e$ which precede an event, and then compute
the fraction of these which is $\epsilon$-close to the current
state $x_n$. Since this fraction evidently depends on the
value of $\epsilon$, we should introduce a normalization. However,
in order to create the ROC statistics we just have
to introduce a threshold which runs from 0 to the largest
value thus found. Both schemes can be straightforwardly
generalized to situations where the current state of the
process is defined by a sequence $x_{(n,k)}$ of $k$ past measurements
$(x_{n-k+1}, x_{n-k+2}, \dots, x_{n-1}, x_n)$, e.g., $k = 2$ for an
AR(2) model, whereas in [3] we were using $k = 10$
for a Markov chain of order 10.
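The threshold sweep that turns such scores into an ROC-curve can be sketched as follows. The helper names `roc_curve` and `strategy_one_scores` are hypothetical, the count of $\epsilon$-close pre-event states is left unnormalized (which does not change the ROC-curve), and the scores are evaluated on the same simulated AR(1) record from which the pre-event states are taken, which is a simplification.

```python
import bisect
import random


def roc_curve(scores, labels):
    """Return (false_alarm_rates, hit_rates) as the alarm threshold is lowered
    from above the largest score to the smallest one."""
    n_pos = sum(1 for e in labels if e)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    far, hit = [0.0], [0.0]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i]:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so emit a point only after a tie group.
        last = rank + 1 == len(order)
        if last or scores[order[rank + 1]] != scores[i]:
            far.append(fp / n_neg)
            hit.append(tp / n_pos)
    return far, hit


def strategy_one_scores(states, labels, eps):
    """Number of pre-event states within eps of each state, i.e. an
    unnormalized estimate of p(x_n | g)."""
    pre = sorted(s for s, e in zip(states, labels) if e)
    return [bisect.bisect_right(pre, s + eps) - bisect.bisect_left(pre, s - eps)
            for s in states]


rng = random.Random(2)
x = [0.0]
for _ in range(5000):
    x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))
states = x[:-1]
labels = [x[n + 1] - x[n] > 2.0 for n in range(len(states))]

far, hit = roc_curve(strategy_one_scores(states, labels, eps=0.1), labels)
area = sum((far[i + 1] - far[i]) * (hit[i + 1] + hit[i]) / 2
           for i in range(len(far) - 1))
print("area under the ROC-curve:", round(area, 3))
```

The same `roc_curve` routine can be fed with estimates of $p(g|x_n)$ to obtain the curve for strategy II.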

Since the wind speed data are strongly correlated,
$a \approx 0.94$, it is not possible to predict the increments of the
data sufficiently well. This corresponds to the previously
derived results for the AR(1) model in the limit $a \to 1$.
However, we also find deviations from the theoretical
ROC-curve for $a = 0.94$, which is additionally plotted in
Figs. 9 and 10. These deviations show that the AR(1)
model is not able to describe the wind data completely.

The wind data also show the increase of predictability 
with increasing event size. This suggests that this effect 
is more general and not limited to the class of AR(1) 
models. Again, we also observe that strategy II is supe- 
rior to strategy I. 



V. EXTREME INCREMENTS IN 
LONG-RANGE CORRELATED PROCESSES 

We studied the same questions as described before
in long-range correlated processes. Since the precursors
we are interested in live on a very short time
scale (one step before the event), one should not expect
long-range correlations to lead to qualitatively different
results for the aspects we are interested in. The results
obtained in this section support this assumption.

There are various definitions of long-range correlation.
Typically, long-range correlation in a time series is characterized
by the exponent $0 < \gamma_c < 1$ of the power-law
decay of the autocorrelation function as a function of the
time $t$,

$$\langle x_n x_{n+t} \rangle \sim t^{-\gamma_c}. \qquad (32)$$

The correlation exponent $\gamma_c$ controls how fast the
correlations decay.

We study the predictability of increments numerically
by applying the prediction strategies described in Sec.
II A. The data used for this numerical study were generated
as described in [2J| and used in [23]: imposing a
power-law decay on the Fourier spectrum,

$$f_x(k) \propto k^{-\beta}, \qquad (33)$$

with $0 < \beta < 0.5$, and choosing phase angles at random,
one obtains through an inverse Fourier transform the
long-range correlated time series in $x$ with $\gamma_c = 1 - 2\beta$.
The data are Gaussian distributed with $\langle x \rangle = 0$, $\sigma = 1$.
Having specified the power spectrum or, correspondingly,
the autocorrelation function for sequences of Gaussian
random numbers means to have fixed all parameters of
a linear stochastic process. Hence, in principle the coefficients
of an autoregressive or moving average process
can be uniquely determined, where, due to the power-law
nature of the spectrum and autocorrelation function, the
order of either of these models has to be infinite [3, 13].
Thus, the effects which we observed for this
ARMA($\infty$, $\infty$) model should be valid for the whole class of linear
long-term correlated processes. The ROC-curves in Fig. 11,
which are generated from the long-range correlated
data, are very similar to the ones for the AR(1) process
with respect to the questions we want to study.

FIG. 11: (Color online) ROC-curves for the ARMA($\infty$, $\infty$)
processes with $\gamma_c = 0.2$ and $\gamma_c = 0.8$.
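The construction of such data can be sketched in a few lines. The example below writes the inverse Fourier transform explicitly as a sum of cosines with amplitudes $k^{-\beta}$ and uniformly random phases; this explicit form, the function names, and the short record length are assumptions made for the illustration and are not taken from the cited references. The resulting values are only approximately Gaussian.

```python
import math
import random


def long_range_series(n, beta, seed=0):
    """Sum of cosines with amplitudes k**(-beta) and uniformly random phases,
    rescaled to zero mean and unit variance."""
    rng = random.Random(seed)
    modes = [(k ** (-beta), rng.uniform(0.0, 2.0 * math.pi))
             for k in range(1, n // 2 + 1)]
    x = [sum(amp * math.cos(2.0 * math.pi * k * t / n + phase)
             for k, (amp, phase) in enumerate(modes, start=1))
         for t in range(n)]
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]


def autocorrelation(x, lag):
    """Sample autocorrelation of a zero-mean, unit-variance series."""
    return sum(x[t] * x[t + lag] for t in range(len(x) - lag)) / (len(x) - lag)


beta = 0.4  # corresponds to gamma_c = 1 - 2 * beta = 0.2
series = long_range_series(512, beta)
for lag in (1, 4, 16, 64):
    print(lag, round(autocorrelation(series, lag), 3))
```

For longer records a fast Fourier transform would replace the explicit double loop; the sum of cosines is used here only to keep the sketch free of external libraries.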

ad (Q1): The ROC-curves obtained by using strategy
II are superior to the curves resulting from strategy
I.

ad (Q2) and (Q3): The quality of the prediction
also increases with increasing event size and decreasing
correlation.

Hence, in a long-range correlated ARMA($\infty$, $\infty$) process
we observe the same effects which we described
before for the AR(1) process and the wind speed data.



VI. CONCLUSIONS 

We studied the predictability of extreme increments
in an AR(1) correlated process, in wind speed data, and
in a long-range correlated ARMA process. To measure
the quality of the prediction we used the ROC-curve and
additionally the slope of the ROC-curve in the vicinity of
the origin as a summary index. This so-called likelihood
ratio characterizes in particular the behavior in the limit
of low false-alarm rates.

In the case of the AR(1) process we could construct 
the posterior PDF and the likelihood analytically from 
a given joint PDF and hence we were able to obtain the 
asymptotic behavior of the likelihood ratio analytically. 
In the case of the two other examples, we constructed the 
posterior PDFs numerically. The resulting distributions 
were then used to determine precursors according to two 
different strategies of prediction. 

In all examples we studied the following aspects: (Q1) Which is
the best strategy to choose precursors? (Q2) How does
the predictability depend on the event size? (Q3) And
how does the predictability depend on the correlation?
The results can be summarized as follows:

ad (Q1): Strategy I, the a posteriori approach, maximizes
the rate of correct predictions, while strategy II
focuses on the minimization of the rate of false alarms.
For the example of the AR(1) process one can show that
strategy II is the optimal strategy to make predictions.
For other stochastic processes, it is not clear in general
which of the two strategies leads to a better predictability.
However, the application to the prediction of wind
speeds and the numerical study within long-range correlated
data reveal that also for these examples better
results are obtained by predicting according to strategy
II.

ad (Q2): For all examples studied, we observe an in- 
crease of predictability with increasing size of the events. 
This phenomenon which is also reported in the literature 
0, llfl [13 1 can be better studied by investigating the 
asymptotic behavior of our summary index. In the case
of the AR(1) process we showed explicitly that the likelihood
ratio increases as the exponential of the squared
event size. In Sec. II B we discussed for a general
stochastic process that this effect appears if the PDFs
of the studied process fulfill certain conditions.

ad (Q3): For the AR(1) process and the long-range
correlated data we observe that the quality of the
predictions decreases with increasing correlation of the
data. The ROC-curves for the wind data, which we
assume to be a strongly correlated AR(1) process with
correlation strength $a = 0.94$, also display a poor
predictability. This effect is due to the special definition of
the events as increments. The asymptotic expression for
the likelihood ratio in Eq. (27) also provides us with a
formal understanding of the $a$-dependence.

All the considerations made in this contribution are
made for a very simple but general method. In order
to make predictions, we use the largest maximum of
the a posteriori PDF or the likelihood. For multimodal
distributions, one can think about more sophisticated
methods which also take into account other maxima
of the distribution. Furthermore, we investigate only
stationary processes in this contribution. It remains
to be studied whether the answers obtained to the
questions (Q1)-(Q3) are also valid for non-stationary
processes or multimodal distributions.

APPENDIX A: OBTAINING AN ASYMPTOTIC
FORM OF THE TOTAL PROBABILITY TO FIND
INCREMENTS OF SIZE $\eta$

The total probability $p_e(\eta, a)$ to find increments of size
$\eta$ can be obtained by integrating the pre-form of the posterior
probability in Eq.|S| For the example of the AR(1)
process the corresponding integral reads

In the special case $\eta = 0$ one can find the analytical form
of the total probability $p_e(0, a)$ using again an integral
identity from [21]. The resulting value $p_e(0, a) = 1/2$
corresponds to the intuitive expectation one would have,
since for $\eta = 0$ the condition of our extreme event is
always fulfilled if $x_{n+1}$ is larger than $x_n$. This special
case of predicting the sign of increments in uncorrelated
data is discussed in [22].
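The value $1/2$ for vanishing event size is easy to check by simulation. The sketch below counts how often the increment of a simulated AR(1) process exceeds a threshold; the function name, the unit-variance driving noise, and the convention of measuring the event size in units of the standard deviation of the data are assumptions of this example.

```python
import random


def increment_exceedance_rate(a, eta, n=100000, seed=0):
    """Fraction of steps of a simulated AR(1) process whose increment exceeds
    eta, with eta measured in units of the standard deviation of the data
    (an assumed convention)."""
    rng = random.Random(seed)
    sigma = (1.0 - a * a) ** -0.5  # stationary std for unit-variance noise
    x = rng.gauss(0.0, sigma)      # start in the stationary state
    hits = 0
    for _ in range(n):
        x_next = a * x + rng.gauss(0.0, 1.0)
        if x_next - x > eta * sigma:
            hits += 1
        x = x_next
    return hits / n


for a in (-0.5, 0.0, 0.9):
    print("a =", a, " rate for eta = 0:", increment_exceedance_rate(a, 0.0))
print("a = 0.5, eta = 1:", increment_exceedance_rate(0.5, 1.0))
```

The estimated rate for threshold zero stays close to one half for every coupling strength, while it drops quickly once the threshold becomes positive.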

For $\eta \neq 0$, we can find an asymptotic form of the
total probability $p_e(\eta, a)$ via evaluating the mean of the
posterior PDF. An analytic expression of the mean can
be obtained using an integral representation from [21].

For large values of $\eta$ we can also assume that the maximum
and the mean of $p(x_n|X(\eta), a)$ nearly coincide, i.e.,

$$x_I \approx -\frac{\eta}{2\sqrt{1-a^2}} \qquad (\eta \to \infty),$$

provided that $p(x_n|X(\eta), a)$ is not too asymmetric (i.e.,
$a$ is not close to $-1$). Using this approximation, we find
the following asymptotic form of the total probability to
find increments of size $\eta$

APPENDIX B: TRANSFORMATION OF
EXTREME INCREMENTS INTO EXTREME
VALUES

We show how to relate the results obtained using
the definition of extreme events as extreme increments
($x_{n+1} - x_n > d$, as in Eq. ©) to the case when
extreme events are defined as extreme values ($y_{n+1} > d$)
which exceed a certain threshold $d$, for ARMA(p,q) processes.
An ARMA(p,q) model is defined as

$$\Phi(B) x_n = \Theta(B) \xi_n, \qquad (\mathrm{B1})$$

where $\{\xi\}$ corresponds to white noise and

$$\Phi(B) = 1 - \Phi_1 B - \Phi_2 B^2 - \dots - \Phi_p B^p,$$

$$\Theta(B) = 1 + \Theta_1 B + \Theta_2 B^2 + \dots + \Theta_q B^q,$$

with $B^j x_n = x_{n-j}$. Searching for extreme increments in
a time series $\{x\}$ is equivalent to searching for extreme values
in the time series $\{y\}$, defined through the transformation

$$y_{n+1} = x_{n+1} - x_n. \qquad (\mathrm{B2})$$



Assuming that $\{x\}$ is described by an ARMA(p,q)
process defined by Eq. (B1), and inserting Eq. (B2)
in Eq. (B1), one obtains that $\{y\}$ is described by an
ARMA(p,q+1) model with the following transformed
coefficients

$$\Theta'_i = \Theta_i - \Theta_{i-1}, \quad i = 1, 2, \dots, q \quad (\Theta_0 = 1), \qquad \Theta'_{q+1} = -\Theta_q. \qquad (\mathrm{B3})$$


Due to the transformation (B2), the precursory structure
equivalent to the one used in Sec. III is obtained by
choosing

$$y_{\mathrm{pre}} = \sum_{j=0}^{n-1} y_{n-j} + x_0. \qquad (\mathrm{B4})$$

With this choice of precursory structure and the corresponding
transformation of the process (Eq. (B2)), the
results obtained for extreme increments can be transferred
to the case of extreme values. In particular, for
the case of AR(1) processes (which corresponds to an
ARMA(1,0)) discussed in Sec. III, all results are also
valid for an ARMA(1,1) process with the precursor given
by (B4) and events defined as extreme values. For example,
the alarm strategies in this case consist of raising an alarm
whenever $y_{\mathrm{pre}}$ falls near the precursor values given in
Eq. (UJ.
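The effect of the differencing on the moving-average part can be checked with a few lines of code. The sketch assumes the sign convention $\Theta(B) = 1 + \Theta_1 B + \dots + \Theta_q B^q$ and obtains the transformed coefficients by multiplying $\Theta(B)$ with $(1 - B)$; the function name and the numerical test on an AR(1) process are illustrative choices.

```python
import random


def differenced_ma_coefficients(theta):
    """Coefficients Theta'_1 .. Theta'_{q+1} of (1 - B) * Theta(B), where
    Theta(B) = 1 + theta[0]*B + ... + theta[q-1]*B**q (assumed convention)."""
    poly = [1.0] + list(theta)   # coefficients of Theta(B), with Theta_0 = 1
    shifted = [0.0] + poly       # coefficients of B * Theta(B)
    product = [p - s for p, s in zip(poly + [0.0], shifted)]
    return product[1:]           # drop the leading coefficient 1


# AR(1), i.e. ARMA(1,0): the increments follow an ARMA(1,1) process.
print(differenced_ma_coefficients([]))

# Numerical check for AR(1): y[n] - a*y[n-1] must equal xi[n] - xi[n-1].
a = 0.7
rng = random.Random(0)
xi = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x = [0.0]
for n in range(1, 1000):
    x.append(a * x[-1] + xi[n])
y = [0.0] + [x[n] - x[n - 1] for n in range(1, 1000)]
max_err = max(abs(y[n] - a * y[n - 1] - (xi[n] - xi[n - 1]))
              for n in range(3, 1000))
print("largest residual:", max_err)
```

The autoregressive part is left unchanged by the transformation, which is why only the moving-average coefficients are computed here.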
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